Using Algorithmic Attribution Techniques to Determine Authorship in Unsigned Judicial Opinions

Authors

  • William Li
  • Pablo Azar
  • David Larochelle
  • Phil Hill
  • James Cox
  • Robert C. Berwick
  • Andrew W. Lo
Abstract

This Article proposes a novel and provocative analysis of judicial opinions that are published without indicating individual authorship. Our approach provides an unbiased, quantitative, and computer scientific answer to a problem that has long plagued legal commentators.

United States courts publish a shocking number of judicial opinions without divulging the author. Per curiam opinions, as traditionally and popularly conceived, are a means of quickly deciding uncontroversial cases in which all judges or justices are in agreement. Today, however, unattributed per curiam opinions often dispose of highly controversial issues, frequently over significant disagreement within the court. Obscuring authorship removes the sense of accountability for each decision’s outcome and the reasoning that led to it. Anonymity also makes it more difficult for scholars, historians, practitioners, political commentators, and (in the thirty-nine states with elected judges and justices) the electorate to glean valuable information about legal decisionmakers and the way they make their decisions.

The value of determining authorship for unsigned opinions has long been recognized but, until now, the methods of doing so have been cumbersome, imprecise, and altogether unsatisfactory. Our work uses natural language processing to predict authorship of judicial opinions that are unsigned or whose attribution is disputed. Using a dataset of Supreme Court opinions with known authorship, we identify key words and phrases that can, to a high degree of accuracy, predict authorship. Thus, our method makes accessible an important class of cases heretofore inaccessible. For illustrative purposes, we explain our process as applied to the Obamacare decision, in which the authorship of a joint dissent was subject to significant popular speculation. We conclude with a chart predicting the author of every unsigned per curiam opinion during the Roberts Court.

Source: Stanford Technology Law Review, Vol. 16:485 (Spring 2013)

Author notes

  • William Li is a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and a 2012 graduate of the Technology and Policy Program at the Massachusetts Institute of Technology (MIT).
  • Pablo Azar is a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT).
  • David Larochelle is an engineer at the Berkman Center for Internet & Society at Harvard University.
  • Phil Hill is a Fellow at the Berkman Center for Internet & Society at Harvard University and a 2013 J.D. Candidate at Harvard Law School.
  • James Cox was an associate with Jenner & Block LLP during drafting of this Article, and currently serves as an attorney for the United States government.
  • Robert C. Berwick is Professor of Computational Linguistics and Computer Science and Engineering in the Departments of Electrical Engineering and Computer Science and Brain and Cognitive Sciences, MIT.
  • Andrew W. Lo is the Charles E. and Susan T. Harris Professor at the MIT Sloan School of Management, Principal Investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT), and a joint faculty in the MIT Electrical Engineering and Computer Science Department.

We thank John Cox at MIT, Andy Sellars and Ryan Budish at the Berkman Center, and Philip C. Berwick at the Washington University in St. Louis Law School for their invaluable feedback, and Jayna Cummings for editorial assistance.

Contents

  INTRODUCTION
  I. UNSIGNED OPINIONS
     A. Historical Context of Unsigned Opinions
     B. Problems with Unsigned Opinions
     C. Solving Attributional Questions the Old-Fashioned Way
     D. Solving Attributional Questions Algorithmically
  II. TEST CASE: OBAMACARE
  III. EXPERIMENTAL SETUP
     A. Experimental Questions
     B. Data Preparation
     C. Machine Learning System Overview
     D. Design of Authorship Attribution System
        1. Document Representation
        2. Model Selection
        3. Feature Selection
  IV. EMPIRICAL RESULTS AND DISCUSSION
     A. Feature Sets and Classification Models
     B. Comparison of Feature Selection Models
     C. Interpreting Authorship Attribution Model Scores
     D. Insights on Writing Styles
     E. Controlling for Clerks
     F. Authorship Prediction for Sebelius
     G. Comparison to Predictions by Domain Experts
     H. Section-by-Section Analysis
  V. AUTHORSHIP PREDICTIONS FOR PER CURIAM OPINIONS OF THE ROBERTS COURT
  CONCLUSION
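
The core idea in the abstract, learning characteristic words and phrases from opinions of known authorship and then scoring an unattributed text against each candidate author, can be illustrated with a small sketch. This is not the authors' system: the abstract does not name the classifier or feature set used, so the example below substitutes a simple multinomial Naive Bayes model over word unigrams and bigrams. The author labels, sample sentences, and function names (`ngrams`, `train`, `predict`) are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max (words and short phrases)."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


def train(labeled_docs):
    """Count n-gram occurrences per author from (author, text) pairs."""
    counts = defaultdict(Counter)
    doc_totals = Counter()
    for author, text in labeled_docs:
        counts[author].update(ngrams(text))
        doc_totals[author] += 1
    vocab = {f for c in counts.values() for f in c}
    return counts, doc_totals, vocab


def predict(model, text):
    """Score each author with add-one-smoothed log-probabilities; return the best."""
    counts, doc_totals, vocab = model
    n_docs = sum(doc_totals.values())
    scores = {}
    for author, c in counts.items():
        total = sum(c.values())
        score = math.log(doc_totals[author] / n_docs)  # prior from document share
        for f in ngrams(text):
            if f in vocab:  # n-grams never seen in training are ignored
                score += math.log((c[f] + 1) / (total + len(vocab)))
        scores[author] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy corpus: two invented "authors" with different stock phrases.
training = [
    ("author_a", "the court cannot countenance such a result in any event"),
    ("author_a", "in any event the statute cannot be read that way"),
    ("author_b", "to be sure the argument has some force but it fails"),
    ("author_b", "the argument fails and to be sure the text is clear"),
]
model = train(training)
best, scores = predict(model, "in any event the court cannot agree")
print(best, scores)
```

On real opinions one would expect to need far more text per author, a held-out evaluation set, and some form of feature selection, which the article's contents list suggests the authors address separately.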


Similar resources

English Text Classification by Authorship and Date

We performed two experiments with statistical techniques for classifying documents by date and author, using large bodies of publicly-available texts. In one experiment, we produced a Markov chain of every United States Supreme Court opinion ever written, and evaluated its ability to classify American judicial opinions by decade of authorship. In the other, we examined the performance of two se...

Mixture of Experts Authorship Attribution Notebook for PAN at CLEF 2012

For problems A, B, C, D, I, and J we used three authorship attribution techniques: a distance-based nearest neighbor, an SVM, and a method that used a distance-based NN approach to classify sections of a document and then assigned the document to whoever wrote the majority of its sections. These three techniques were then considered experts and each given a vote to determine the author of each document. For probl...

Quantitative Authorship Attribution: An Evaluation of Techniques

The basic assumption of quantitative authorship attribution is that the author of a text can be selected from a set of possible authors by comparing the values of textual measurements in that text to their corresponding values in each possible author’s writing sample. Over the past three centuries, many types of textual measurements have been proposed, but never before have the majority of thes...

Domain Independent Authorship Attribution without Domain Adaptation

Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...

Application of Information Retrieval Techniques for Source Code Authorship Attribution

Authorship attribution assigns works of contentious authorship to their rightful owners, solving cases of theft, plagiarism, and authorship disputes in academia and industry. In this paper we investigate the application of information retrieval techniques to attribution of authorship of C source code. In particular, we explore novel methods for converting C code into documents suitable for retrie...

Journal title: Stanford Technology Law Review

Volume: 16   Issue:

Pages: -

Publication date: 2013